Scalable Co-clustering Algorithms
Abstract
Co-clustering has been used extensively in a variety of applications because it can discover latent local patterns that conventional unsupervised algorithms such as k-means do not reveal. Recently, a unified view of co-clustering algorithms, called Bregman co-clustering (BCC), has provided a general framework that subsumes several existing co-clustering algorithms, so we expect this framework to be applied to more data types. However, the data collected in real-life application domains easily grows too large to fit in the main memory of a single-processor machine, which makes enhancing the scalability of BCC a critical practical challenge. To address this, and ultimately to ease rapid deployment to a wider range of applications with larger data, we parallelize all twelve co-clustering algorithms in the BCC framework using the Message Passing Interface (MPI). We validate their scalability on eleven synthetic datasets and one real-life dataset, and report their speedup under varied parameter settings.
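As a rough illustration of the data-parallel pattern the abstract alludes to, the sketch below splits the rows of a matrix across workers, lets each worker accumulate partial sums and counts per (row cluster, column cluster) block, and then combines them into global block means. This is not the authors' implementation: the function names (`local_stats`, `reduce_stats`, `cocluster_means`) are hypothetical, the cluster assignments are assumed to be given, and the element-wise reduction is simulated in a single process where an MPI program would use an all-reduce across ranks.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Stats = Tuple[List[List[float]], List[List[int]]]


def local_stats(rows: Matrix, row_labels: List[int],
                col_labels: List[int], k: int, l: int) -> Stats:
    """Partial sums and counts per block, computed from one worker's rows."""
    sums = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for row, r in zip(rows, row_labels):
        for j, value in enumerate(row):
            c = col_labels[j]
            sums[r][c] += value
            counts[r][c] += 1
    return sums, counts


def reduce_stats(partials: List[Stats], k: int, l: int) -> Stats:
    """Element-wise sum over workers (assumption: stands in for an MPI all-reduce)."""
    total_sums = [[0.0] * l for _ in range(k)]
    total_counts = [[0] * l for _ in range(k)]
    for sums, counts in partials:
        for r in range(k):
            for c in range(l):
                total_sums[r][c] += sums[r][c]
                total_counts[r][c] += counts[r][c]
    return total_sums, total_counts


def cocluster_means(data: Matrix, row_labels: List[int], col_labels: List[int],
                    k: int, l: int, n_workers: int) -> Matrix:
    """Block means obtained from row-partitioned partial statistics."""
    chunk = (len(data) + n_workers - 1) // n_workers  # ceiling division
    partials = []
    for w in range(n_workers):
        lo, hi = w * chunk, min((w + 1) * chunk, len(data))
        partials.append(
            local_stats(data[lo:hi], row_labels[lo:hi], col_labels, k, l))
    sums, counts = reduce_stats(partials, k, l)
    return [[sums[r][c] / counts[r][c] if counts[r][c] else 0.0
             for c in range(l)] for r in range(k)]


data = [[1.0, 1.0, 5.0, 5.0],
        [1.0, 1.0, 5.0, 5.0],
        [9.0, 9.0, 3.0, 3.0]]
row_labels = [0, 0, 1]      # assumed row-cluster assignment
col_labels = [0, 0, 1, 1]   # assumed column-cluster assignment
means = cocluster_means(data, row_labels, col_labels, 2, 2, n_workers=2)
```

Because sums and counts are additive, the result does not depend on how many workers the rows are split across, which is what makes this statistic straightforward to distribute.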
Related Resources
Scalable Ensemble Information-Theoretic Co-clustering for Massive Data
Co-clustering is effective for simultaneously clustering the rows and columns of a data matrix. Yet different co-clustering models usually produce very different results. In this paper, we propose a scalable algorithm that co-clusters massive, sparse, high-dimensional data and combines the individual clustering results to produce a better final result. Our algorithm is particularly suitable for distribute...
Data Clustering Based on Key Identification
Clustering has been one of the main building blocks in the fields of machine learning and computer vision. Given a pairwise distance measure, it is challenging to find a proper way to identify a subset of representative exemplars and their associated cluster structures. The recent trend in big data analysis places more demanding requirements on new clustering algorithms, which must be both scalable and accura...
Scalable Clustering Using Graphics Processors
We present new algorithms for scalable clustering using graphics processors. Our basic approach is based on k-means, but it reorders the way of determining object labels, and exploits the high computational power and pipeline of graphics processing units (GPUs). The core operations in clustering algorithms, i.e., distance computing and comparison, are performed by utilizing the fragment vector ...
Kernel-Based Clustering of Big Data
There has been a rapid increase in the volume of digital data in recent years. Analysis of this data, popularly known as big data, necessitates highly scalable data analysis techniques. Clustering is an exploratory data analysis tool used to discover the underlying groups and structures in the data. State-of-the-art scalable clustering algorithms assume “linear separability” of the cluster...
Intelligent scalable image watermarking robust against progressive DWT-based compression using genetic algorithms
Image watermarking refers to the process of embedding an authentication message, called a watermark, into the host image to uniquely identify its ownership. In this paper, a novel, intelligent, scalable, robust wavelet-based watermarking approach is proposed. The proposed approach employs a genetic algorithm to find nearly optimal positions at which to insert the watermark. The embedding positions coded as chr...